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Abstract. Although commonly employed by X-ray astronomers, maximum likelihood estimators are known to be biased. In 
this paper we investigate the bias associated to the measure of the temperature from an X-ray thermal spectrum. We show that, 
in the case of low surface brightness regions, commonly adopted estimators, such as those based on x' and Cash statistics, 
return strongly biased results. We stress that this can have strong implications when measuring the temperature of cluster outer 
regions with current experiments. We consider various approaches to overcome this problem, the most effective is a technique 
which allows us to correct the bias a posteriori. Extensive montecarlo simulations show that our correction returns excellent 
results under different conditions. 
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1. Introduction 

With the advent of XMM-Newton and Chandra it has become 
possible to explore to a certain extent the physical properties of 
the intra-cluster medium (ICM) in the outer regions of galaxy 
clusters. In dealing with these regions, there are both statistical 
and systematic issues which need to be addressed: typically, 
the spectra have poor statistic (i.e. few counts/bin) and a high 
background, especially at high energies, where, also due to the 
sharp decrease of the effective area of the experiments, the in- 
strumental background dominates over other components. In 
this paper we employ simulations to examine how best to an- 
alyze this kind of spectra. Here we focus on the treatment of 
statistical errors and neglect syst ematic ones, which will b e 
discussed in a forthcoming paper jLeccardi & MolendilEoOTi) . 
More specifically the question we wish to address is the fol- 
lowing; "What are the effects of pure statistical uncertainties in 
determining interesting parameters of highly non linear models 
(e.g. the temperature of the ICM), when we analyze spectra ac- 
cumulated from low surface brightness regions using current 
X-ray experiments?" To address this question, we perform a 
set of simulations: first, we choose the input values for model 
parameters and produce the expected spectrum; then, we gen- 
erate a large number of perturbed spectra representing a large 
set of measurements; finally, we analyze them with different 
techniques based on the method of maximum-likelihood (here- 
after ML) and compare the results. Our choices of simulation 
parameters (e.g. spectral model, energy band, fixed parameters, 
etc.) are justified by our practical issue, i.e. the determination 
of the temperature in the outer regions of massive galaxy clus- 



ters. Our analysis is mainly focused on XMM-Newton, however 
most of our results are valid in all cases when analyzing low 
count Poisson-distributed data. 

From a more general perspective, ours may be viewed as 
an attempt to quantify the significance of the bias of ML esti- 
mators commonly adopted by X-ray astronomers to determine 
spectral parameters. As we shall see, the most common ML es- 
timators, indeed all those available within XSPECQ, are charac- 
terized by a substantial bias when applied to our specific case. 
A long term solution to the problem requires an unbiased, or 
perhaps a less biased, estimator to be found and implemented 
within standard fitting packages (i.e. XSPEC). Another, faster, 
solution involves correcting the bias a posteriori making use of 
extensive montecarlo simulations. 

The outline of the paper is the following. In ^we consider 
the idealized source only case. In ^we include the background 
considering two cases: in the first the source contribution is 
much more important than the background one, in the second 
the opposite is true. In ^we try to correct the bias: in ^4.1l we 
search for a long term solution of the problem using different 
estimators; in ^4. 21 and ^4.3l we correct the bias a posteriori, in 
particular in ^4.3l we explain how to use our ad-hoc recipe (i.e. 
the so-called "triplet" method). In ^we summarize our main 
results. 

Errors are quoted at one sigma for one interesting parame- 
ter, unless otherwise stated. 



We used XSPEC 11.3 (http://heasarc.nasa.gOv/docs/xanadu/xspec/xspecll/index.h 
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2. The source only case 

In this section we deal with the idealized source only case. 
We represent the source with an absorbed thermal model 
(WABS*MEKAL in XSPEC). The parameter values are the follow- 
ing: the equivalent hydrogen column density along the line of 
sight, A^H, is 2.5 x 10^" cm"^; the metallicity, Z, and the redshift, 
z, are respectively 0.25 solar and 0.2; the temperature, kT, is 
7 keV and the normalization, A^s, is 3.5 x 10"^ in XSPEC units. 
Our redistribution matrix (RMF) and effective area (ARF) have 
been produced from the observation number 0093030101 of 
the galaxy cluster Abell 1689 with the EPIC-MOSl instru- 
ment; the angular size of the region for which we accumulate 
the simulated spectra is about 4 arcmin^, which corresponds to 
the ring between 1.0' and 1.5' centered on the cluster emission 
peak. The exposure times considered are 5 ks, 10 ks, 100 ks 
and 1 000 ks and the total counts in the 2.0-10.0 keV band are 
respectively about 600, 1200, 12000 and 120000. For each 
channel we perturb the number of counts with a Poisson dis- 
tribution centered on the expected value. We repeat this step 
for A^meas timcs (with A^meas Very large) to obtain A^meas spectra, 
which simulate A^meas independent measurements of the source. 

We fit simulated spectra using the;^f^ and the Cash statistics, 
where the latter is more s uitable to analyze spectra with few 
count s per channe l (Cash 19791; iNousek & Shuel[l989llMighell 
19991; lArzner et al.i2006 ). We recall that each measurement can 
be represented by the number of counts, (9i, observed in each 
channel / (i - 1, A^ where A^ is the number of channels). The 
probability, Q, of obtaining this particular measurement (i.e. 
this particular spectrum) is the product of Poisson distributions 
and can be expressed as a function of the expected counts, E\, 
which depenco on the particular set of model parameters, a 
(e.g. in this case a - (Nh, kT, Z, z, Ns))'- 



Ei^^i e 



Oil 



(1) 



The associated log-ML function C jCash 1 19791) is defined as 
foUows: 



C{a) = -2 ln2(Q') = -2 ^ (Oi Infi; - - InOi!) 

1=1 



(2) 



The best set of parameters is determined by maximizing Q (i.e. 
minimizing C) with respect to a. Conversely, the statistic is 
based on the hypothesis that each spectral bin contains a suf- 
ficient number of counts that the deviations of the Oi from the 
Ei have a Gaussian distribution. This hypothesis is satisfied for 
large Oi, when Q can be approximated by a product of Gaussian 
distributions and the associated log-ML function is defined 
as follows: 



X' 



(Oi - EiY 



(3) 



where o"; is usually the uncertainty in the i-th bin (cr; - Oi'^^). 
The larger Oi, the better the approximation of Gaussian regime. 



Table 1. Weighted averages of temperature best fit values com- 
pared to the input value and relative dififerences AT/Tq, using 
different exposure times and statistics. 











Cash 




Exp. 






AT /To 


kT" 


at/To 


1000 


7.00 


6.89±0.01 


-1.6% 


7.00±0.01 


+0.0% 


100 


7.00 


6.83±0.01 


-2.4% 


7.03+0.01 


+0.4% 


10 


7.00 


6.76±0.03 


-3.4% 


6.91±0.02 


-1.3% 


5 


7.00 


6.59±0.04 


-5.9% 


6.81±0.03 


-2.7% 



^ In the following equations the dependency of Ei from a is omitted 
for clarity. 



Notes: " exposure time in kiloseconds; ^ input temperature in keV; 
measured temperature in keV; relative difference. 



Channel grouping is a widely used strategy that allows to re- 
duce the bias introduced by this approximation. We group 
channels in order to have at least 25 counts per bin, which is 
a commonly adopted compromise. Conversely, when using the 
Cash statistic we perform a minimal grouping to avoid chan- 
nels with no counts, i.e. the spectrum is substantially unbinned 
and no spectral information is lost. Each spectrum is fitted be- 
tween 2.0 and 10.0 keV (the energy band we are interested in) 
with the absorbed thermal model mentioned above. The A^h is 
fixed to the input value (typical values of A^h for cluster ob- 
servations have negligible effects above 2 keV), z is allowed to 
vary between 0.186 and 0.214 (+7% of the input value), while 
kT, Z and A^s are free. We determine best fit values and one 
sigma uncertainties for all parameters. 

In Table [T] we compare the weighted average of the A^meas 
measured temperatures to the input value, kTo = 7 keV, for dif- 
ferent exposure times and statistics. A^meas is chosen in order 
to have similar uncertainties on the average (A^meas = 1 200 for 
5 and 10 ks, A^„eas = 300 for 100 and 1 000 ks). In almost all 
cases, the true temperature is underestimated by a few percent 
and the effect becomes more evident for shorter exposure times. 
We recall that both;^^^ and Cash statistics are based on the ML 
method. Although X-ray astronomers make extensive use of 
ML e stimators, it is well known from the literature (e.g. lCowan 
19981) that: 1) ML estimators may be biased, i.e. the expecta- 
tion value may be different from the true value of the quantity to 
estimate; 2) ML estimators are usually gaussian and unbiased 
only in the asymptotic limit. In the case at hand, the asymptotic 
limit is approached when the total number of counts becomes 
large. The results reported in Table[T]show that: 1) both ML es- 
timators are biased; 2) both estimators are asymptotically unbi- 
ased; 3) the Cash estimator tends to the true value more quickly 
than the one. 

As we have just pointed out, the is significantly more 
biased than the Cash estimator (i.e. the difference between the 
expected and the true value is greater). This is because the ap- 
proximation of gaussian regime fails for few counts per bin. 
The obvious implication is that to improve the precision of the 
X^ estimates we need to increase the number of counts in each 
bin, A^bin- In Table |2] we compare the results obtained using the 
X^ with different channel groupings (note that using the Cash 
statistic this is not necessary). The input temperature is 7 keV, 
the exposure time is 1 000 ks and the number of measurements 
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Table 2. Weighted averages of temperature best fit values com- 
pared to the input value and relative differences AT/Tq, using 
different channel groupings. 









Ar/ro 


400 


7.00 


6.99±0.01 


-0.1% 


100 


7.00 


6.95±0.01 


-0.7% 


25 


7.00 


6.89±0.01 


-1.6% 



Notes: " counts per bin; input temperature in keV; measured tem- 
perature in keV; relative difference. 



is 300. As expected, we find that the greater is the number of 
counts in each bin, A^bin, the smaller is the bias. However, in 
practice, grouping of a large number of channels is not desir- 
able, because it causes loss of spectral information; 25 counts 
per bin is a commonly adopted compromise. We have to men- 
tion the existence of an alternative way to reduce the bias which 
aff'ec ts the estimator for few coiints per bin. Some authors 



(e.g. Churazov et al. I ll996l: iGehrelsl 119861: iKearns etal.lll995l) 



choose dififerent statistic weights {a\ in Eq. O instead of the 
standard (9i'^^. We re-analyzed our spectra using all the alter- 
native weights implemented in XSPEC and we obtain results 
rather similar to those already discussed for the Cash statistic. 

In this section we have analyzed the ideal case of a thermal 
source without a background. The results are summarized 
as follows. The standard statistic works well only in the 
Gaussian regime, which is reached when performing a strong 
channel grouping (see Table |2|i. When using a realistic group- 
ing (e.g. 25 counts per bin) the measured temperature, kT, is 
lower th an the true t emperature, kT^ (see Table [T]l- The Cash 
statistic dCash works better, because it is based on the 



ML function for Poisson processes; however, when the spectra 
total number of counts is small, kT is lower than kTo by a few 
percent (see Table [TJ. This means that the Cash ML estimator 
is only asymptotically unbiased ( for a review a bout parameter 
estir nation and ML concepts se e Cowan T998I). Many effort s 



Gehrels 


1986; .Nousek&Shue 


1989; Kearnsetal. 1995 


Churazov et al.1 19961 Ijadins & Riisaser 19961 iMishell 


1999 


Hauschild & Jentschell 20011 Beramann & Riisaseil 


2002 


Arzner et al. 


20061) have been devoted to extend to the case 



of low count spectra the standard theories about curve fitting 
(best fit parameters and confidence intervals estimation, 
goodness-of-fit test, etc.). However no definitive solution has 
been found. 



3. The source plus background case 

In this section we consider a more realistic situation by intro- 
ducing a simplified instrumental background. We model it with 
a power law (PEGPWRLW/b within XSPEC), which is convolved 
with the RMF but not multiplied by the ARE. The power law 
slope, Fb, is fixed to 0.25; the normalization, A^b, is calculated 
at the center of the energy band to minimize the correlation 
with Fb. 



There are two ways of analyzing spectra with background: 
we can subtract it using a spectrum from blank field observa- 
tions or we can model it. When modeling the background, one 
can use the whole energy band (2.0-11.3 keV rather than 2.0- 
10.0 keV as when using the subtraction) to increase the statis- 
tic. Indeed, due to the high energy sharp decrease of the ARE of 
EPIC-MOSl, beyond 10 keV the thermal component becomes 
negligible. The background subtraction using the statistic is 
a widely used technique; however, in the previous section (see 
© we showed that, for low count spectra, the Cash statistic 
is more suitable than the x^ with reasonable grouping. Since 
the Cash statistic requires the number of counts in each chan- 
nel to be gre ater than zero, the background has to be modeled 
(Cash 19791) . We shall analyze simulated spectra in both ways 
and we shall compare the results. Hereafter we call "sub-;t'^" 
the standard analysis technique and "mod-C" the analysis us- 
ing the Cash statistic and the background modeling. 

We proceed as for the source only case, considering as 
a guideline the Abell 1689 observation mentioned in ^ We 
produce a simulated spectrum choosing realistic input val- 
ues for an absorbed thermal (see © plus background (see 
above) model (WABS*MEKAL+PEGPWRLW/b in XSPEC) and we 
produce A^meas different measurements with a poissonian per- 
turbation of the number of counts in each channel. In the 
mod-C case, each spectrum is associated to the RMF and 
the ARE and is fitted between 2.0 and 11.3 keV with the 
WABS"MEKAL+PEGPWRLW/b model. The A^h and Fb are fixed to 
the input values, z is allowed to vary between 0.186 and 0.214 
(±7% of the input value), kT, Z, A^s and A^b are free. We deter- 
mine best fit values and one sigma uncertainties for all param- 
eters. Finally, we compute the weighted average of all A^meas 
values of each parameter using one sigma uncertainties and we 
compare it with the input value. In the sub-;^'^ case, we sim- 
ulate a background only spectrum with a long exposure time. 
We consider a PEGPWRLW/b model (slope and normalization 
are equal to those of the power law in the source observation 
mentioned above) and we perturb the expected spectrum as ex- 
plained above. The adopted background spectrum is the same 
for all A^meas measurements and its exposure time is 1 000 ks. 
We group the channels of each of the A^meas source spectra to 
have at least 25 counts per bin and we associate the background 
spectrum, the RMF and the ARE to the binned spectrum. We 
fit the net spectrum with a thermal model only (WABS*MEKAL 
in XSPEC) in the 2.0-10.0 keV band to determine the best fit 
values, we compute the weighted averages and compare them 
with the input values. 

We consider two spatial regions: the ring between 1.0' and 
1.5' centered on the cluster emission peak, where the source 
dominates over the background (see Fig. [T^) and the ring be- 
tween 4.5' and 6.0', where the background dominates (see 
Fig. [TJ)). Input values for the normalizations of both compo- 
nents are the best fit values measured in the two rings of the 
Abell 1689 observation mentioned in ^ More specifically, in 
the inner ring A^s = 3.5 x 10"^ and A^b = 15 and in the outer 
ring A^s = 7.0 x lO"-* and A^b = 17.5 (XSPEC units). For 
each ring we consider three input temperatures (namely 5, 7 
and 9 keV) and two exposure times (10 and 100 ks). 
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INNER RING OUTER RING 




5 10 5 10 

channel energy (keV) channel energy (keV) 



Fig. 1. Simulated spectra accumulated in an inner ring (left panel. Fig. la) and in an outer ring (right panel, Fig. lb). The solid 
line is the source contribution, the dashed line is the background and the dotted is the sum. In the outer ring, beyond 3 keV, 
background counts dominate over source counts. See text for further details and for model parameters. 

Table 3. Comparison between the results obtained using the sub-;^'^ and the mod-C data analysis techniques. 









sub 






mod-C 


Ring 


Exp. " 




kT " 


at/To 


d 




at/To 


1.0'-1.5' 


100 


5.00 


4.84±0.01 


-3.2 ' 




4.96±0.01 


-0.8% 


1.0'-1.5' 


100 


7.00 


6.78±0.02 


-3.1 ' 


7o 


6.97±0.02 


-0.4% 


1.0'-1.5' 


100 


9.00 


8.69±0.02 


-3.4' 


Jo 


8.97±0.03 


-0.3% 


1.0'-1.5' 


10 


5.00 


4.81±0.03 


-3.8' 


% 


4.82±0.03 


-3.6% 


1.0' -1.5' 


10 


7.00 


6.78+0.05 


-3.1 ' 


7o 


6.79±0.05 


-3.0% 


1.0'- 1.5' 


10 


9.00 


8.68±0.11 


-3.6' 


7c 


8.62±0.08 


-4.2% 


4.5'-6.0' 


100 


5.00 


3.95±0.01 


-21.0 ' 


7o 


4.71+0.02 


-5.8% 


4.5'-6.0' 


100 


7.00 


5.24±0.02 


-25.1 ' 


7o 


6.45+0.03 


-7.9% 


4.5'-6.0' 


100 


9.00 


6.43±0.02 


-28.6 ' 


7c 


8.09+0.04 


-10.1% 


4.5'-6.0' 


10 


5.00 


3.02±0.03 


-39.6 ' 


7c 


3.20+0.03 


-36.0% 


4.5'-6.0' 


10 


7.00 


3.68±0.04 


-47.4 ' 


7c 


3.94+0.04 


-43.7% 


4.5'-6.0' 


10 


9.00 


4. 11 ±0.05 


-54.3 ' 


7c 


4.52+0.06 


-49.8% 



Notes: " exposure time in kiloseconds; ^ input temperature in keV; measured temperature in keV; relative difference. 



In Table |3] we show the comparison between the two differ- 
ent data analysis techniques described above (i.e. sub-;^'^ and 
mod-C). First we consider the inner ring, where the source 
dominates over the background. The results are very similar to 
the case without background (see ^ Table[T]l. For the 100 ks 
case mod-C returns the correct temperature and suh-x^ slightly 
underestimates it. For shorter exposure times both techniques 
return a slightly biased value (bias a!3%). No significant trend 
with the input temperature, kTo, is found. When considering 
the outer ring, where the background dominates, we find dif- 
ferent results: in all cases the true temperature is strongly un- 
derestimated. There is a clear trend with the input temperature: 
the higher kTo, the stronger the bias. For long exposure times, 
mod-C (bias s;10%) works better than sub-;^^- (bias ^30%). 
Conversely, for short exposure times both techniques underes- 
timate the true temperature by a factor of about 2. These results 
are qualitatively similar to those found for the source only case 
(see Table [1), but the bias is much stronger. 

We have repeated the same analysis described above for a 
particular set of simulated spectra (namely in the outer ring. 



with exposure time of 10 ks and kT{) - 7 keV) modeling 
the source with a bremsstrahlung rather than a MEKAL. The 
bremsstrahlung model is simpler and can be expressed as an 
analytic function of its two free parameters (i.e. the tempera- 
ture and the normalization). Conversely, the MEKAL model has 
two further parameters (metallicity and redshift) and its com- 
plicated dependency on the parameters is not expressed in an 
analytic form: the expected values are tabulated on a finite grid 
as a function of all parameters. For this particular set of spec- 
tra, the bias for MEKAL and bremsstrahlung models is the same. 
This suggests that the bias is not related to the approximation 
of a finite grid of values. 

Some insight on the origin of the bias can be gained by 
inspecting the probability density function (p.d.f.) of the pa- 
rameter of interest (in this case the temperature). Here we deal 
with the Cash statistic, similar considerations apply to the x^- 
For each measurement, we define as the absolute mini- 
mum value of the function C{a). As in the previous section 
(see © we minimize C{a) (see Eq. |2]i to determine our best 
estimate, abest, of the parameter set (Cmin = C(Qfbest))- Cash 
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Tx [keV] kT [keV] 

Fig. 2. A cumulative distribution function (at left) and the associated probability density function (at right). 




kT [keV] kT [keV] 

Fig. 3. A comparison between representative p{T) for single measurements extracted randomly in different conditions. The 
attention should be focused on the shapes of the p.d.f., rather than on temperature values. Top panels: the inner ring, where 
the source dominates over the background. Bottom panels: the outer ring, where the background is dominant. For left panels 
the exposure time is 100 ks, for right panels 10 ks. The input temperature is always 7 keV. Note that the scales in ordinate 
are different. Clearly the curves become less symmetric and less similar to gaussians, as the exposure time decreases and the 
background contribution increases. 



(11979^ showed that the function AC (i.e. C-Cmin) follows 
distribution, therefore the confidence intervals can be generated 
in a standard way (e.g. using the XSPEC command ERROR). 
With the XSPEC command STEPPAR we produce the func- 
tion C for each free parameter (here we consider only the tem- 
perature). We calculate AC(r) and, since it is distributed, 
we can associate to each temperature, Tx, the probability that 



the true value is less or equal to Tx, i.e. the cumulative distri- 
bution function (c.d.f.) of the temperature, P{Tx)- Given that 
PiTx) - Jq^ p(T) dT, we can derive the p.d.f. of the tempera- 
ture, p(T), for each single measurement (see Fig.|2]i. In Fig. [3] 
we compare representative p{T) for single measurements in 
different conditions. For each case the p(T) is chosen randomly 
from the A^meas different measurements; therefore, the attention 
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should be focused on the shapes of the p.d.f., rather than on 
temperature values. Clearly the curves become less symmetric 
and less similar to gaussians, as the exposure time decreases 
and the background contribution increases. The input temper- 
ature also plays a role: the higher is the temperature, the less 
symmetric is the curve. Summarizing, the poorer the statistical 
quality of the data, the more asymmetric the piT), the stronger 
the bias. 

The way measurements are combined does not change the 
result. We have experimented with two different methods: the 
weighted average of individual measurements and the prod- 
uct of individual p.d.f.. The weighted average roughly approx- 
imates the p(T) to a gaussian function and implies the neglec- 
tion of the contribution of high temperature tails. A more ap- 
propriate way to join informations from different and indepen- 
dent measurements is to multiply single p.d.f.. The best value 
for the parameter corresponds to the maximum of the joined 
p.d.f.. We multiply all A^nieas p.d.f., computed as explained 
above, and we still find a discrepancy between best fit and true 
values. The bias is only slightly weaker than when computing 
a weighted average. We have also tried computing the p(T) in 
a different way, i .e. using the parametric bootstrap technique 
(IPress et al.lll992h . which consists in creating and analyzing a 
large number of fake datasets starting from model best fit val- 
ues. We obtain essentially the same results. 

In Table [3] we have showed that the strength of the bias 
mainly depends on the total number of counts and on the back- 
ground contribution. A possibility to increase the total counts 
is to extend the band to lower energies. We have explored it 
analyzing one of our set of simulated spectra (namely in the 
outer ring, with exposure time of 10 ks and kTo = 7 keV) be- 
tween 0.5 and 11.3 keV. In this energy band the correlation 
between kT and A^s is weaker and the uncertainty on both pa- 
rameters for a single measurement is smaller. Using the mod- 
C technique, the bias in the broad band is smaller (^10% vs. 
^40%) than in the narrow band; this suggests that also the pa- 
rameter degeneracy could play an important role when fitting 
in the 2.0-11.3 keV band. However, in practice, it is not use- 
ful to enlarge the band to lower energies. The main reasons are 
the imperfect calibration of EPIC instruments and the presence 
of the galactic X-ray background (negligible beyond 2 keV), 
which introduce systematic effects that are hard to take into 
account. Moreover, broadband spectra are substantially more 
contaminated by emission from low temperature components 
loca ted on the same hne of sight as the domin ant component 
(see lMazzotta et alj|2004l : lvikhlinin et"al]|2005h . 



In this section we have analyzed the realistic case of a ther- 
mal source with a background. In suc h conditions a stronger 



bias is expected jEadie et al.l 1197 It iBergmann & Riisager 



20021) . As in the source only case (see §2]), we find that: 1) 
the^^ and the Cash estimators are strongly biased; 2) the Cash 
estimator is less biased than the one, especially for long ex- 
posure times. The strength of the bias depends mainly on two 
factors: the total number of counts and the background contri- 
bution. 



4. Different attempts to correct the bias 

Having established that neither sub-;^^ nor mod-C return ac- 
ceptable results, we are faced with two alternative ways to pro- 
ceed. A long term solution to the problem requires that an un- 
biased, or perhaps a less biased ML estimator, be found and 
implemented within standard fitting packages (i.e. XSPEC). 
Another, faster, solution involves correcting the bias a poste- 
riori making use of extensive montecarlo simulations. In the 
following subsections we show our main results obtained ex- 
ploring both approaches. 



4.1. Using different estimators 



From the literature (e.g. ICowanll 1 998h we know that even if X 
is an unbiased estimator of X, f(X) is not necessary an unbi- 
ased estimator of f(X). Reversing the argument one can argue 
that if r is a biased estimator of T, there may exist a trans- 
formation, /, such that f(T) is an unbiased (or at least less bi- 
ased) estimator of f(T). To test this idea we define in XSPEC 
(using the MDEFINE command) an analytic model similar to 
bremsstrahlung, which we dub BREM2: 



St(E) =NsE 



-4/3 7—1/2 



exp 



(-!)■ 



(4) 



where the energy, E, is expressed in keV; the normalization, 
A^s, is chosen to reproduce the same flux as a MEKAL with no 
metals. 

We simulate 3 sets of 3 000 thermal plus background 
(BREM2+PEGPWRLW/b) spectra with the following input param- 
eters: A^s = 7.2 X 10-^ A^B = 17.5 and T = 7 keV. These pa- 
rameters correspond to those adopted in the case of the outer 
region (see ®. Each set has a different exposure time: 10, 20 
and 100 ks. We define 3 different estimators of the temperature: 

c = . 

and their respective models: 
Sa(E)^Ns E-*'^A^'^exp(-AE) , 
Sb{E) = A^s E-'^'^ Bexp[-B^E) , 
5 c(£) = A^s E-"^'^ exp (-C^ e) . 

For simplicity we have considered only power laws as different 
fiT). We fit each set of spectra with these models and mea- 
sure the best estimate of /(T). We compute the weighted aver- 
age of the 3 000 fiT) and calculate T using the inverse func- 
tion. In Table |4] we report the results of this analysis for 
different exposure times and estimators. The choice of the es- 
timator strongly affects the bias. When using T as estimator, 
we obtain results very similar to those obtained with a MEKAL. 
This is expected, because the model BREM2 is very similar to 
a bremsstrahlung (see Eq. lU; note also that in ^we showed 
that the bias is roughly the same when considering a MEKAL 
or a bremsstrahlung. When considering the bias as a function 
of the power law index, we find a minimum corresponding to 
r^'^^, which is the best estimator among those considered. For 
short exposure times (i.e. 10 ks) the use of r^'''^ instead of T 
reduces the bias by a factor of 4. We suggest that this could be 
related to the degree of complexity of the derivative of S(E) 
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Table 4. Comparison between the results obtained using different estimators of the temperature. The input temperature is 7 keV. 





100 ks 


20 ks 




10 ks 


Est. " 


kT ^ 


^T|To 


kT ^ 


^T|To 


kT ^ 


Ar/ro ■= 


T 


6.44±0.03 


-8.0% 


4.96±0.05 


-29.1% 


4.04±0.07 


-42.3% 


r-' 


6.96±0.03 


-0.6% 


7.46±0.08 


+6.6% 


8.83±0.14 


+26.1% 


7-1/2 


6.88±0.03 


-1.7% 


6.59±0.06 


-5.9% 


6.36±0.09 


-9.1% 


7-1/4 


6.80±0.03 


-2.9% 


6.24±0.06 


-10.9% 


5.72±0.08 


-18.3% 



Notes: " temperature estimator; ^ measured temperature in keV; relative difference. 



with respect to the estimator Note also that, when slightly in- 
creasing the statistic (e.g. when considering 20 ks of exposure 
time), the bias associated to the r^'^- estimator is almost negli- 
gible if compared to typical statistic uncertainties. These results 
encourage the exploration of this approach (i.e. to consider dif- 
ferent estimators) in order to find a rigorously derived unbiased 
estimator of the temperature. 

4.2. Fitting witli a log-normal function 

The shape of the p{T) resembles the log-normal function, 
which is the p.d.f. of any random variable whose logarithm is 
normally distributed. If X is a random variable with a normal 
distribution, then x = exp(X) has a log-normal distribution. The 
log-normal distribution has p.d.f. 

fix; 0-) = e-^'" ■-'^>^^2.' 

xcr yln 

for X > Q, where fi and cr are respectively the mean and the 
standard deviation of the variable's logarithm. The expected 
value is 

E(X) = e''^"'''^ (6) 
and the variance is 

var(X) = (e"^' - 1) e^'^^'^. (7) 

We fit each p(T) with a log-normal function, f(x; fj., cr) (see 
Eq.|5]l, and we calculate the best values of yu, and cr,-. We com- 
pute a weighted average of yu,- using crv^ as weights and calcu- 
late the expected value (see Eq.|6ll and the uncertainty, i.e. the 
variance (see Eq.|7]i divided by the square root of the number of 
measurements. The results are reported in Table|5] In all cases, 
this method provides better results than a simple weighted av- 
erage (see Table [3]for a comparison). There is still a bias of a 
few percent, except for the case of the outer ring with 10 ks: 
in this case the bias is greater than 10%. Thus, when the back- 
ground contribution is small the log-normal distribution pro- 
vides a good estimate, while when the background is domi- 
nant the result is still biased, especially for few total counts, 
but much less than when using the standard techniques. 

4.3. A semi-empirical method: summing three 
distributions 

The three EPIC instruments (MOSl, MOS2 and pn) on board 
XMM-Newton provide three simultaneous and independent 



Table 5. Results obtained fitting p(T) with a log-normal distri- 
bution. 



Ring 


Exp. " 


^7-0 " 


kT " 


AT /To 


1.0'-1.5' 


100 


5.00 


4.96±0.01 


-0.8 % 


1.0'-1.5' 


100 


7.00 


6.97±0.01 


-0.4 % 


1.0'-1.5' 


100 


9.00 


8.97±0.01 


-0.3 % 


1.0'-1.5' 


10 


5.00 


4.88±0.02 


-2.4 % 


1.0' -1.5' 


10 


7.00 


6.90±0.05 


-1.4% 


1.0'- 1.5' 


10 


9.00 


8.81+0.13 


-2.1 % 


4.5'-6.0' 


100 


5.00 


4.90+0.02 


-2.0 % 


4.5'-6.0' 


100 


7.00 


6.77+0.04 


-3.3 % 


4.5'-6.0' 


100 


9.00 


8.51+0.09 


-5.4 % 


4.5'-6.0' 


10 


5.00 


4.68+0.13 


-6.4 % 


4.5'-6.0' 


10 


7.00 


5.90+0.24 


-15.7 % 


4.5'-6.0' 


10 


9.00 


7.67+0.51 


-14.8 % 



Notes: " exposure time in kiloseconds; input temperature in keV; 
measured temperature in keV; relative dilference. 



measurements of the same target; therefore, when dealing with 
EPIC data, one has the necessity of correctly combining these 
three measurements. A weighted average is the simplest proce- 
dure, however in ^we showed that it leads to biased results. In 
^we also showed that the strength of the bias is related to the 
shape of the p.d.f. and in ^4.2| we showed that a fit with a log- 
normal function does not return sufficiently accurate results; in 
this section we try to proceed in a different way, emphasizing 
the contribution of p.d.f. tails at high temperatures. We derive 
A^meas measurements of the temperature with their correspond- 
ing p.d.f., as in the mod-C case described in |3] We divide the 
A^meas measurements in groups of three and, for each group, we 
consider the three p.d.f., /?i(r) (i=l,2,3), and combine them in a 
non-standard way: we calculate the sum, rather than the prod- 
uct, of the single p.d.f.. In practice, it is equivalent, but more 
useful, to sum directly the c.d.f., Pi(Tx)- The sum is renormal- 
ized dividing it by 3. We define P^umiTx) as follows: 

I 3 

^'sum(rx)=TXi^i(^x). (8) 

/=1 

This is a sort of joined c.d.f. of three measurements and the as- 
sociated p.d.f. is usually more symmetric than the single pi(T). 
We define as T , and the temperatures which corre- 
spond to a probability, P,um(Tx), of 0.1587, 0.5000 and 0.8413 
respectively (see Fig. |4]i. We consider Tm as the best estimate 
for the three joined measurements, dr^ = (Tm-T^)/ VSasthe 
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Fig. 4. Visual representation of the definition of T , Tyi and 
from a joined cumulative distribution function. 

Table 6. Results obtained with the semi-empirical "triplet" 
method. 



Ring 


Exp. ^ 




kT" 


Ar/ro 


1.0'-1.5' 


100 


5.00 


4.97±0.01 


-0.6 % 


1.0'-1.5' 


100 


7.00 


6.97±0.02 


-0.4 % 


1.0'-1.5' 


100 


9.00 


8.98±0.03 


-0.2 % 


1.0'-1.5' 


10 


5.00 


4.90±0.04 


-2.0 % 


1.0'-1.5' 


10 


7.00 


6.94±0.07 


-0.9 % 


1.0'-1.5' 


10 


9.00 


8.98±0.11 


-0.2 % 


4.5'-6.0' 


100 


5.00 


5.00±0.02 


-0.0 % 


4.5'-6.0' 


100 


7.00 


6.90±0.04 


-1.4% 


4.5'-6.0' 


100 


9.00 


8.91±0.05 


-1.0% 


4.5'-6.0' 


10 


5.00 


5.04±0.06 


+0.8 % 


4.5'-6.0' 


10 


7.00 


6.97±0.09 


-0.4 % 


4.5'-6.0' 


10 


9.00 


8.96±0.13 


-0.4 % 



Notes: " exposure time in kiloseconds; input temperature in keV; 
measured temperature in keV; relative difference. 



lower uncertainty and dr^ = (T^-Tm)/ V3 as the upper uncer- 
tainty. Thus we have A^meas/3 "triplet" measurements: Tm^^'^-- 
We compute the weighted average of the A^meas/3 "triplets" and 
in Table |6] we compare the results with the input values. In al- 
most all cases this semi-empirical method (hereafter "triplet" 
method) provides excellent results: the discrepancy is lower 
than 2% and often comparable with the statistical uncertainty. 

We have tried joining different numbers of measurements 
together; simulations show that, when considering two mea- 
surements at a time, we find the temperature to be underes- 
timated, when considering five measurements, we obtain sub- 
stantially correct results, as when using the "triplets". This sug- 
gest that the effectiveness of our a posteriori correction depends 
on the number of measurements we combine. We propose to 
use the "triplets" because three is the minimum number of mea- 
surements for which we obtain unbiased temperature estimates 
and because this is a natural choice when analyzing EPIC data. 

We want to stress that this technique is not rigorously de- 
rived from statistics principles, but we have showed that it is the 



only method that returns the expected temperature under very 
different situations (e.g. different background contributions and 
exposure times). This could be related to the fact that joined 
p.d.f. are usually more symmetric than the single ones. 



5. Summary 



It is well kn own from the literature (e.g lEadie et al.lll971l or 
Cowanll 19981) that ML estimators are generally biased and that 
they are gaussian and unbiased only in the asymptotic limit. In 
this paper we test the effects of statistical fluctuations in deter- 
mining the temperature from a thermal spectrum. In particular, 
we explore a range of conditions for which ML estimators re- 
veal their intrinsic bias. 

In the source only case we show that: 

1 . the estimators of the temperature based on the Cash and the 

statistics are biased for short exposure times, i.e. for few 
counts; 

2. the Cash statistic works better than the;^'^, as pointed out 
bv lCashl(ll979h : 

3. the statistic works as well as the Cash, when strongly 
increasing channel grouping. 

In the source plus background case we show that: 

1 . the standard analysis techniques (sub-;^'^ and mod-C) return 
heavily biased results; 

2. the strength of the bias depends mainly on the total number 
of counts and on the background contribution; 

3. the use of different estimators of the temperature, in partic- 
ular r^'^^, strongly reduces the bias. 

The last point is encouraging in order to find a long term so- 
lution of the problem (i.e. a rigorously derived unbiased esti- 
mator). As an alternative and immediate solution we propose 
the so-called "triplet" method, which makes use of a correction 
of the bias a posteriori, working on the probability distribution 
functions. This semi-empirical recipe returns the correct result 
under very different situations, even though it is not rigorously 
derived from statistics principles. 

We point out that our results can have strong implications 
for the measurement of the temperature from spectra accumu- 
lated from low surface brightness regions (e.g. the outer re- 
gions of galaxy clusters) with current ex periments, i.e. XMM- 



Newton and Chandra (see for example lPiffarettietal.1 120051 



Vikhlinin et al. 2005; Pratt et al. 2007). Indeed the bias seems 
to be related to the statistical quality of the data, which typi- 
cally depends on the distance of a given region from the core. 
For this reason we might expect a net effect on radial tem- 
perature profiles of galaxy clusters. In a forthcoming paper 
(ILeccardi & Molendill2007b we will analyze a sample of clus- 
ters to determine the mean temperature profile using our data 
analysis technique. 
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